On evaluating brain tissue classifiers without a ground truth.

Authors

  • Sylvain Bouix
  • Marcos Martin-Fernandez
  • Lida Ungar
  • Motoaki Nakamura
  • Min-Seong Koo
  • Robert W McCarley
  • Martha E Shenton
Abstract

In this paper, we present a set of techniques for the evaluation of brain tissue classifiers on a large data set of MR images of the head. Due to the difficulty of establishing a gold standard for this type of data, we focus our attention on methods which do not require a ground truth, but instead rely on a common agreement principle. Three different techniques are presented: the Williams' index, a measure of common agreement; STAPLE, an Expectation Maximization algorithm which simultaneously estimates performance parameters and constructs an estimated reference standard; and Multidimensional Scaling, a visualization technique to explore similarity data. We apply these different evaluation methodologies to a set of eleven different segmentation algorithms on forty MR images. We then validate our evaluation pipeline by building a ground truth based on human expert tracings. The evaluations with and without a ground truth are compared. Our findings show that comparing classifiers without a gold standard can provide a lot of interesting information. In particular, outliers can be easily detected, strongly consistent or highly variable techniques can be readily discriminated, and the overall similarity between different techniques can be assessed. On the other hand, we also find that some information present in the expert segmentations is not captured by the automatic classifiers, suggesting that common agreement alone may not be sufficient for a precise performance evaluation of brain tissue classifiers.
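As a rough illustration of the common agreement principle behind the Williams' index named in the abstract, the sketch below compares one classifier's average agreement with the others to the average agreement among those others. A value above 1 suggests the classifier agrees with the group at least as well as the group agrees internally; a value near 0 flags an outlier. The Jaccard overlap as the agreement measure, the function names, and the toy label lists are all assumptions made for illustration, not the paper's implementation.

```python
from itertools import combinations


def jaccard(a, b, label):
    """Jaccard overlap of the voxels assigned `label` in two label lists."""
    in_a = {i for i, v in enumerate(a) if v == label}
    in_b = {i for i, v in enumerate(b) if v == label}
    union = in_a | in_b
    # Two empty regions are treated as perfect agreement (an assumption).
    return len(in_a & in_b) / len(union) if union else 1.0


def williams_index(segmentations, j, label):
    """Ratio of classifier j's mean agreement with the others to the
    mean pairwise agreement among the others (hypothetical helper)."""
    others = [s for k, s in enumerate(segmentations) if k != j]
    if len(others) < 2:
        raise ValueError("need at least three segmentations")
    with_j = sum(jaccard(segmentations[j], s, label) for s in others) / len(others)
    pairs = list(combinations(others, 2))
    among = sum(jaccard(a, b, label) for a, b in pairs) / len(pairs)
    # Raises ZeroDivisionError if the other classifiers never overlap.
    return with_j / among


# Toy data: each list is one classifier's labels for six voxels.
segs = [
    [0, 1, 1, 2, 2, 0],
    [0, 1, 1, 2, 2, 0],
    [0, 1, 1, 2, 0, 0],
    [1, 0, 0, 2, 2, 1],  # deliberately inconsistent classifier
]
scores = [williams_index(segs, j, label=1) for j in range(len(segs))]
```

In this toy case the fourth classifier never overlaps with the others on label 1, so its index is 0, while the first three score above 1.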

Similar resources

Evaluating Automatic Brain Tissue Classifiers

We present a quantitative evaluation of MR brain image segmentation. Five classifiers were tested. The task was to classify an MR image into four different classes: background, cerebrospinal fluid, gray matter and white matter. The performance was rated by first estimating a ground truth (EGT) using STAPLE and then analyzing the volume differences as well as the Dice similarity measure betwe...

Effect of Errors in Ground Truth on Classification Accuracy

The effect of errors in ground truth on the estimated thematic accuracy of a classifier is considered. A relationship is derived between the true accuracy of a classifier relative to ground truth without errors, the actual accuracy of the ground truth used, and the measured accuracy of the classifier as a function of the number of classes. We show that if the accuracy of the ground truth is kno...

Evaluating Classifiers Without Expert Labels

This paper considers the challenge of evaluating a set of classifiers, as done in shared task evaluations like the KDD Cup or NIST TREC, without expert labels. While expert labels provide the traditional cornerstone for evaluating statistical learners, limited or expensive access to experts represents a practical bottleneck. Instead, we seek methodology for estimating performance of the classif...

Two Methods for Validating Brain Tissue Classifiers

In this paper, we present an evaluation of seven automatic brain tissue classifiers based on level of agreements. A number of agreement measures are explained, and we show how they can be used to compare different segmentation techniques. We use the Simultaneous Truth and Performance Level Estimation (STAPLE) of Warfield et al. but also introduce a novel evaluation technique based on the Willia...

Exploiting Semantic Relatedness Measures for Multi-label Classifier Evaluation

In the multi-label classification setting, documents can be labelled with a number of concepts (instead of just one). Evaluating the performance of classifiers in this scenario is often as simple as measuring the percentage of correctly assigned concepts. Classifiers that do not retrieve a single concept existing in the ground truth annotation are all considered equally poor. However, some clas...

Journal:
  • NeuroImage

Volume 36, Issue 4

Pages: -

Publication date: 2007